1. Introduction


The analysis of C-sample designs in the presence of stratification is a problem frequently 
faced by practitioners. 

In the industrial field a variety of stratified analysis scenarios present themselves. Take, 
for example, a company that wishes to assess the performance of three different formulas for 
a new dishwasher detergent. Multiple dishwashers are used and multiple washes are carried 
out. At the end of each wash, an expert provides an evaluation of the cleaning performance of 
the formula. When analyzing the resulting data, the effect of using one dishwasher instead of 
another cannot be ignored, so each dishwasher is considered to be a separate stratum. Likewise, 
in the healthcare field it is quite common for multiple drugs to be tested on patients of different 
age groups. Each age group is again considered to be a stratum. 

In this paper we focus on a scenario from the field of education. We are interested in 
assessing how the performance of students from different degree programs at the University of 
Padova changes, in terms of university credits and grades, when compared with their entrance 
exam results. In other words, we want to assess whether people who achieved the best results 
in this exam perform best during their academic career. 

The entrance exam can have three possible outcomes (i.e. it is an ordinal variable). This is
therefore a typical stochastic ordering problem (Basso et al., 2009; Basso and Salmaso, 2011;
Bonnini et al., 2014), that is, a problem in which the main interest lies in evaluating the null
hypothesis $Y_1 \stackrel{d}{=} \dots \stackrel{d}{=} Y_C$ against the alternative hypothesis
$Y_1 \stackrel{d}{\leq} \dots \stackrel{d}{\leq} Y_C$ and $E[\psi(Y_1)] \leq \dots \leq E[\psi(Y_C)]$,
where at least one inequality is strict, and $\psi(\cdot)$ is an increasing function
(Pesarin and Salmaso, 2010). Our aim is in fact to assess whether, as the entrance exam
outcome improves, the $C = 3$ corresponding distributions of the student’s performance
measure $Y$ are stochastically ordered.

A few nonparametric methods have been proposed in the literature to address these problems.
Among them, Jonckheere’s test (Jonckheere, 1954; Terpstra, 1952) is one of the first
nonparametric solutions to test for ordered alternatives. It is based on the Mann-Whitney
test (Mann and Whitney, 1947), which is used to perform all the possible $[C \times (C - 1)]/2$
pairwise comparisons between the $C$ groups. Neuhäuser et al. (1998) also proposed a
modification of this test that appears to be more powerful than the original test with small
sample sizes (Shan et al., 2014). Additionally, permutation-based solutions involving the
Non-Parametric Combination (NPC) technique (Pesarin and Salmaso, 2010; Klingenberg et al.,
2009; Finos et al., 2007, 2008) were introduced.
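As a rough illustration of how Jonckheere’s statistic aggregates the pairwise Mann-Whitney comparisons, the following Python sketch sums the Mann-Whitney counts over all $[C \times (C - 1)]/2$ ordered pairs of groups. The function names and the toy data are illustrative assumptions, and counting ties as 1/2 is a common convention rather than something stated in the text.

```python
from itertools import combinations
from typing import Sequence


def mann_whitney_count(x: Sequence[float], y: Sequence[float]) -> float:
    """Count pairs (a, b), a in x and b in y, with a < b; ties count 1/2 (assumption)."""
    return sum(1.0 if a < b else 0.5 if a == b else 0.0 for a in x for b in y)


def jonckheere_statistic(groups: Sequence[Sequence[float]]) -> float:
    """Sum the Mann-Whitney counts over all pairs (i, j) with i < j."""
    pairs = combinations(range(len(groups)), 2)
    return sum(mann_whitney_count(groups[i], groups[j]) for i, j in pairs)


# Toy data (hypothetical): three groups listed in the hypothesised order.
toy_groups = [[1, 2], [3, 4], [5, 6]]
n_pairs = len(list(combinations(range(len(toy_groups)), 2)))
j_stat = jonckheere_statistic(toy_groups)
print(n_pairs, j_stat)
```

Large values of the statistic support the ordered alternative; reversing the order of the toy groups drives it to zero.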

We propose a further extension of the NPC technique to address stochastic ordering problems
in the presence of stratification. Indeed, the impact of the student’s choice of degree
program cannot be ignored, so stratification must be considered in the testing procedure.

In Section 2 we describe the proposed permutation-based approach. In Section 3 we apply
it to the case study of interest related to university education. Finally, Section 4
provides the results and conclusions.


2. Methodology 


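Firstly, let us further describe the stochastic ordering problem. The main interest lies in
evaluating the system of hypotheses:

$$
\begin{cases}
H_0: Y_1 \stackrel{d}{=} \dots \stackrel{d}{=} Y_C \\
H_1: Y_1 \stackrel{d}{\leq} \dots \stackrel{d}{\leq} Y_C \text{ and at least one strict inequality } \stackrel{d}{<},
\end{cases}
$$

where the symbol $\stackrel{d}{=}$ denotes equality in distribution and $\stackrel{d}{<}$ denotes
stochastic dominance, i.e. $Y_1 \stackrel{d}{<} Y_2$ if and only if $F_1(z) \geq F_2(z), \forall z$
and $\exists I: F_1(z) > F_2(z), z \in I$ with $\Pr(I) > 0$, where $F_j$ is the cumulative
distribution function of $Y_j$. An alternative way to write this is:

$$
\begin{cases}
H_0: F_1 = F_2 = \dots = F_C \\
H_1: F_1 \geq F_2 \geq \dots \geq F_{(C-1)} \geq F_C \text{ and at least one strict inequality.}
\end{cases}
\tag{1}
$$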


NPC-based solutions generally consider a particular decomposition. The hypotheses are 
split in order to recreate the conditions of a set of two-sample problems as follows: 


$$
\begin{cases}
H_0 = \bigcap_{i=1}^{C-1} H_{i0}: (F_1 = \dots = F_i) = (F_{i+1} = \dots = F_C) \\
H_1 = \bigcup_{i=1}^{C-1} H_{i1}: (F_1 = \dots = F_i) \geq (F_{i+1} = \dots = F_C),
\end{cases}
$$

where the null hypothesis $H_0$ is the intersection of $C - 1$ partial null hypotheses and the
alternative hypothesis $H_1$ is the union of $C - 1$ sub-hypotheses.

For each pair of sub-hypotheses $H_{i0}$ and $H_{i1}$, the first $i$ and the last $(C - i)$
samples are pooled, so that two new samples $X_1$ and $X_2$ are obtained, with sizes $N$ and
$M$. The sub-problem can therefore be rewritten as:

$$
\begin{cases}
H_{i0}: X_1 \stackrel{d}{=} X_2 \\
H_{i1}: X_1 \stackrel{d}{\leq} X_2.
\end{cases}
$$
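The pooling step can be sketched in Python as follows. This is a minimal illustration, assuming the $C$ samples are supplied in the hypothesised order; the function name `pooled_splits` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def pooled_splits(
    samples: Sequence[Sequence[float]],
) -> List[Tuple[List[float], List[float]]]:
    """Return the C-1 pseudo two-sample problems (X1, X2).

    For i = 1, ..., C-1, X1 pools the first i samples and X2 pools the
    remaining C-i samples.
    """
    n_groups = len(samples)
    splits = []
    for i in range(1, n_groups):
        x1 = [v for group in samples[:i] for v in group]
        x2 = [v for group in samples[i:] for v in group]
        splits.append((x1, x2))
    return splits


# Toy data (hypothetical): C = 3 ordered groups.
groups = [[1.0, 2.0], [2.5, 3.0, 3.5], [4.0]]
splits = pooled_splits(groups)
for i, (x1, x2) in enumerate(splits, start=1):
    print(f"sub-problem {i}: N = {len(x1)}, M = {len(x2)}")
```

With $C = 3$ groups this yields two sub-problems: the first group against the last two, and the first two groups against the last one.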


Each sub-hypothesis is then tested separately, using appropriate permutation tests. The adopted
test statistic can differ according to the nature of the data, but a common and versatile choice is
the modified version of the Anderson-Darling test statistic:

$$
T = \sum_{i=1}^{n} \frac{\hat{F}_1(X_i) - \hat{F}_2(X_i)}{\{\bar{F}(X_i)\,[1 - \bar{F}(X_i)]\}^{1/2}} \tag{2}
$$

where $X = \{X_1, X_2\}$ is the pooled sample, $\hat{F}_1(t) = \sum_{X_i \in X_1} I(X_i \leq t)/N$,
$\hat{F}_2(t) = \sum_{X_i \in X_2} I(X_i \leq t)/M$, $\bar{F}(t) = \sum_{i=1}^{n} I(X_i \leq t)/n$,
$n = N + M$, $t \in \mathbb{R}$ and $I(\cdot)$ is the indicator function, which
is 1 if $(\cdot)$ is satisfied and 0 otherwise.
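A minimal Python sketch of the statistic in Equation 2 is given below. The sign convention (first pooled sample minus second), the square root in the denominator and the function names are assumptions made for illustration. At the pooled maximum both empirical distribution functions equal 1, so the numerator is zero and the sketch treats that term as contributing nothing.

```python
import math
from typing import Sequence


def ecdf(sample: Sequence[float], t: float) -> float:
    """Empirical cumulative distribution function of `sample` evaluated at t."""
    return sum(1 for v in sample if v <= t) / len(sample)


def ad_statistic(x1: Sequence[float], x2: Sequence[float]) -> float:
    """Modified Anderson-Darling statistic; large values suggest X1 <= X2 in distribution."""
    pooled = list(x1) + list(x2)
    total = 0.0
    for x in pooled:
        f_bar = ecdf(pooled, x)
        denom = math.sqrt(f_bar * (1.0 - f_bar))
        if denom > 0.0:  # skip the pooled maximum, where the term is 0/0 (assumption)
            total += (ecdf(x1, x) - ecdf(x2, x)) / denom
    return total


# Toy data (hypothetical): the first sample sits entirely below the second.
t_sep = ad_statistic([1, 2, 3], [4, 5, 6])
t_rev = ad_statistic([4, 5, 6], [1, 2, 3])
t_same = ad_statistic([1, 2, 3], [1, 2, 3])
print(t_sep > 0, t_rev < 0, t_same)
```

Swapping the two samples flips the sign of the statistic, which is why the direction of the alternative must match the order in which the samples are pooled.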

According to the NPC algorithm (Pesarin and Salmaso, 2010), $B$ permuted datasets are
independently generated for each sub-problem and the related values of the test statistic
$T^*_b, b = 1, \dots, B$ are calculated to simulate the null distribution of $T$. Partial
p-values $\lambda_i$ and the $\lambda^*_{ib}, b = 1, \dots, B$ estimating their distributions
can therefore be obtained. It is worth noting that the same permutation design is adopted for
each sub-problem, to implicitly take into account the existing dependency among sub-problems.
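The permutation step can be sketched as follows, under stated assumptions: each permuted dataset is obtained by shuffling the pooled observations once and is then reused for all $C - 1$ sub-problems, and p-values are estimated as $(0.5 + \text{count})/(B + 1)$, one common convention that avoids exact zeros. The function names, the value of `B` and the toy data are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple


def ecdf(sample: Sequence[float], t: float) -> float:
    return sum(1 for v in sample if v <= t) / len(sample)


def ad_statistic(x1: Sequence[float], x2: Sequence[float]) -> float:
    """Modified Anderson-Darling statistic (sign convention is an assumption)."""
    pooled = list(x1) + list(x2)
    total = 0.0
    for x in pooled:
        f_bar = ecdf(pooled, x)
        denom = math.sqrt(f_bar * (1.0 - f_bar))
        if denom > 0.0:
            total += (ecdf(x1, x) - ecdf(x2, x)) / denom
    return total


def sub_problem_stats(values: Sequence[float], sizes: Sequence[int]) -> List[float]:
    """Statistics of the C-1 sub-problems: first i groups vs. last C-i groups."""
    stats, cut = [], 0
    for n in sizes[:-1]:
        cut += n
        stats.append(ad_statistic(values[:cut], values[cut:]))
    return stats


def partial_p_values(
    samples: Sequence[Sequence[float]], B: int = 200, seed: int = 0
) -> Tuple[List[float], List[float], List[List[float]]]:
    rng = random.Random(seed)
    sizes = [len(g) for g in samples]
    data = [v for g in samples for v in g]
    observed = sub_problem_stats(data, sizes)
    permuted = []
    for _ in range(B):
        shuffled = data[:]
        rng.shuffle(shuffled)  # one shuffle shared by all sub-problems
        permuted.append(sub_problem_stats(shuffled, sizes))
    p_values = []
    for i, t_obs in enumerate(observed):
        exceed = sum(1 for row in permuted if row[i] >= t_obs)
        p_values.append((0.5 + exceed) / (B + 1))
    return p_values, observed, permuted


# Toy data (hypothetical): three clearly ordered groups of five observations.
toy = [[1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0], [11.0, 12.0, 13.0, 14.0, 15.0]]
lam, t_obs, t_perm = partial_p_values(toy, B=200, seed=1)
print(lam)
```

Keeping the full matrix of permuted statistics (`t_perm`) is what later allows the partial tests to be combined while preserving their dependence.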


A combination step now needs to be performed. The partial p-values $\lambda_i, i = 1, \dots, C - 1$
related to the $C - 1$ sub-problems $\{H_{i0} \text{ vs } H_{i1}\}$ are combined using an adequate
combining function, such as Fisher’s combining function
$T''_F = -2 \cdot \sum_{i=1}^{C-1} \log(\lambda_i)$. The same is done for each of the $B$ vectors
$\lambda^*_{ib}, i = 1, \dots, C - 1$. The elements of the new resulting vector represent
the second-order test statistics, from which it is finally possible to obtain the global
p-value $\lambda''$ to assess the system of hypotheses (1).
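The combination step can be sketched in Python as follows. The sketch assumes that the observed partial statistics and a $B \times (C - 1)$ matrix of permuted statistics are already available, and it estimates every p-value as $(0.5 + \text{count})/(B + 1)$, which is a convention chosen here to keep the logarithms finite. The names `npc_global` and `p_from_null`, and the synthetic Gaussian "statistics", are placeholders and not real data.

```python
import math
import random
from typing import List, Sequence, Tuple


def p_from_null(t: float, null: Sequence[float]) -> float:
    """Estimated p-value of t against a simulated null distribution (large t is significant)."""
    return (0.5 + sum(1 for s in null if s >= t)) / (len(null) + 1)


def fisher(p_values: Sequence[float]) -> float:
    """Fisher's combining function: -2 * sum(log(p))."""
    return -2.0 * sum(math.log(p) for p in p_values)


def npc_global(
    observed: Sequence[float], permuted: Sequence[Sequence[float]]
) -> Tuple[float, float, List[float]]:
    k = len(observed)
    cols = [[row[i] for row in permuted] for i in range(k)]
    # Partial p-values for the observed data and for every permuted dataset.
    lam_obs = [p_from_null(observed[i], cols[i]) for i in range(k)]
    lam_perm = [[p_from_null(row[i], cols[i]) for i in range(k)] for row in permuted]
    # Second-order statistics and the resulting global p-value.
    t2_obs = fisher(lam_obs)
    t2_perm = [fisher(row) for row in lam_perm]
    return p_from_null(t2_obs, t2_perm), t2_obs, t2_perm


# Synthetic placeholder input: B = 200 permuted statistics for 2 sub-problems.
rng = random.Random(1)
perm_stats = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
obs_stats = [50.0, 50.0]  # far in the upper tail of both simulated nulls
lam_global, t2_obs, t2_perm = npc_global(obs_stats, perm_stats)
print(lam_global)
```

Because each row of the permuted matrix comes from a single permuted dataset, combining row by row carries the dependence among the partial tests into the null distribution of the second-order statistic.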
Given that stratification needs to be included, we propose firstly applying this procedure to
each of the $S$ strata, testing $S$ systems of hypotheses:

$$
\begin{cases}
H_{s0}: F_{1s} = F_{2s} = \dots = F_{Cs} \\
H_{s1}: F_{1s} \geq F_{2s} \geq \dots \geq F_{(C-1)s} \geq F_{Cs} \text{ and at least one strict inequality.}
\end{cases}
\tag{3}
$$

After applying the aforementioned NPC-based approach to each stratum, the global p-values
$\lambda''_s, \forall s = 1, \dots, S$ (and the $\lambda''^{*}_{sb}$ estimating their distributions)
are retained. Then we adopt a further combination step, using the Fisher combining function,
and retrieve a final p-value $\lambda'''$. In this way, by comparing $\lambda'''$ to the desired
significance level $\alpha$, we are able to solve the global stochastic ordering problem
$H_0$ vs $H_1$.
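The across-strata combination can be sketched in the same way. The sketch assumes that each stratum contributes its observed second-order statistic together with $B$ permuted values, that permutations are carried out independently within each stratum, and that the $b$-th permuted values of the different strata are paired. The function names and the tiny hand-made input are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def p_from_null(t: float, null: Sequence[float]) -> float:
    """Estimated p-value (0.5 + count) / (B + 1); this convention is an assumption."""
    return (0.5 + sum(1 for s in null if s >= t)) / (len(null) + 1)


def combine_strata(
    t_obs: Dict[str, float], t_perm: Dict[str, List[float]]
) -> Tuple[Dict[str, float], float]:
    strata = sorted(t_obs)
    n_perm = len(t_perm[strata[0]])
    # Stratum-level global p-values.
    lam_s = {s: p_from_null(t_obs[s], t_perm[s]) for s in strata}
    # Fisher combination of the observed stratum-level p-values.
    fisher_obs = -2.0 * sum(math.log(lam_s[s]) for s in strata)
    # Same combination for each permutation index b.
    fisher_perm = []
    for b in range(n_perm):
        lam_b = [p_from_null(t_perm[s][b], t_perm[s]) for s in strata]
        fisher_perm.append(-2.0 * sum(math.log(p) for p in lam_b))
    return lam_s, p_from_null(fisher_obs, fisher_perm)


# Hand-made placeholder input: S = 2 strata, B = 4 permutations.
obs = {"S1": 5.0, "S2": 0.0}
perm = {"S1": [1.0, 2.0, 3.0, 4.0], "S2": [4.0, 3.0, 2.0, 1.0]}
lam_by_stratum, lam_final = combine_strata(obs, perm)
print(lam_by_stratum, lam_final)
```

In this toy input the strong evidence in the first stratum dominates the Fisher combination even though the second stratum shows none, which reflects the union nature of the overall alternative.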

Given that multiple systems of hypotheses $H_{s0}$ vs $H_{s1}, \forall s = 1, \dots, S$ are
assessed, we then apply an appropriate multiplicity correction to control the false discovery
rate (FDR). Our choice is the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995).
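For reference, the Benjamini-Hochberg adjustment can be sketched in a few lines of Python. The function name and the input p-values below are hypothetical and are not taken from the case study.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical stratum-level p-values for S = 4 strata.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print(adj)
```

A stratum is then declared significant when its adjusted p-value does not exceed the chosen FDR level.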


3. A case study 


Let us now focus on the real stratified C-sample problem at hand. As mentioned before,
we are interested in evaluating the performance of students from different degree programs at
the University of Padova. In particular, we want to understand whether the university credits
gained at the end of the first year ($Y^a$), the credits gained at the end of the third year
($Y^b$) and the final average grade ($Y^c$) depend on the results achieved by the student in
the entrance exam. In other words, we try to indirectly assess the efficacy of this exam in
evaluating and selecting future students. The analysis is performed using R (R Core Team, 2020).

Let us briefly describe the data. The total sample size is 3083 students. Firstly, the degree 
programs are grouped into 4 classes (identified by their Italian subject titles): 


• ING_INFORMAZIONE_NON_PROFES (S1)
• ING_CIVILE_AMBIENTALE_L7 (S2)
• ING_INFORMAZIONE_L8 (S3)
• ING_INDUSTRIALE_L9 (S4).

The different classes represent different strata (i.e. $S = 4$) and have different sample sizes
(see Figure 1). The variable reporting the outcome of the entrance exam has three categories
(i.e. $C = 3$), namely INSUFFICIENTE, SUFFICIENTE and PIU’_CHE_SUFFICIENTE (Insufficient,
Sufficient and More Than Sufficient). For the sake of simplicity, we refer to them as INS, SUF
and PIU in our notation. In Figure 1, the possible outcomes are ordered from worst to best.

For each response variable $Y^j, \forall j \in \{a, b, c\}$, we want to assess whether
$Y^j_{INS} \stackrel{d}{\leq} Y^j_{SUF} \stackrel{d}{\leq} Y^j_{PIU}$, with
at least one strict inequality, taking into account the effect of the degree program class.

Looking at credits gained at the end of the first year, a first descriptive analysis (see Figure 
2) appears to support the alternative hypothesis. Indeed, in all strata, students achieving INS at 
the entrance exam appear to perform worse than students achieving SUF, and students achieving 
PIU at the entrance exam tend to perform better than students achieving SUF. 
Similar conclusions can be drawn about both credits gained at the end of the third year (see
Figure 3) and the average grade at the end of the academic career (see Figure 4). 

Applying our testing procedure, we were able to confirm these hypotheses. We set $B = 10000$
and used the test statistic in Equation 2 and Fisher’s combining function. Across the three
response variables (see Table 1), the stratum-level p-values and the global p-values proved
to be substantially smaller than 1%. The only exceptions were ING_CIVILE_AMBIENTALE_L7 (S2)
and ING_INFORMAZIONE_L8 (S3) for $Y^b$, for which the descriptive analysis shows that the
order among entrance exam outcomes is less evident.

Figure 1: Description of the sample.
Figure 2: Credits at the end of the first year.
Figure 3: Credits at the end of the third year.
Figure 4: Average grade at the end of the academic career.


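Table 1: Table of p-values for $Y^a$, $Y^b$ and $Y^c$.


| Response | Global | S1   | S2     | S3     | S4   |
|----------|--------|------|--------|--------|------|
| $Y^a$    | 1e-4   | 1e-4 | 1e-4   | 1e-4   | 1e-4 |
| $Y^b$    | 1e-4   | 2e-4 | 0.1471 | 0.1185 | 2e-4 |
| $Y^c$    | 1e-4   | 4e-4 | 2.9e-3 | 8e-4   | 6e-4 |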


4. Conclusions 


In this paper we presented a new solution to C-sample stochastic ordering problems in the
presence of stratification, focusing on its application to a case study from the field of education.
Our proposal takes advantage of the Non-Parametric Combination (NPC) procedure (Pesarin
and Salmaso, 2010), a versatile permutation-based methodology that can be used to solve
several different complex problems, such as stochastic ordering. We apply this technique to
evaluate the presence of stochastic ordering in each of the $S$ existing strata and then use an
appropriate combining function to assess the stochastic ordering in all the samples.

The application of this procedure allowed us to assess the efficacy of the University of
Padova’s entrance exams in evaluating and selecting future students. Indeed, it emerged that
students with the worst results in the entrance exam tended to perform the worst during their
academic career, in terms of both the university credits achieved at the end of the first and
third years and the final average grade, regardless of the chosen degree program. The only
exceptions were students from ING_CIVILE_AMBIENTALE_L7 and ING_INFORMAZIONE_L8.
For these two strata, when the credits at the end of the third year were considered, it was not
possible to find enough evidence in favor of the stochastic ordering hypothesis.

Overall, this approach appears to be promising, and a simulation study has been
planned to further explore its performance.